Data Analysis with Python

seaborn

seaborn makes it easy to draw attractive graphs in Python.

This page is about graphical analysis. I do not cover adjusting colors and shapes.

Plot of Panda (pandas plotting) is good for viewing many variables at once, but stratified graphs are not easy to make with it.

Common Code

This setup code must be run before any code that starts with "sns".

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()
df = pd.read_csv("Data.csv", engine='python')
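The file Data.csv is not included on this page. To try the code without it, a stand-in table can be generated. The sketch below is only an assumption about the layout: the column names X1, Y1, C1, C2, and C3 come from the code on this page, while the values, the category labels, and the helper name make_sample_data are made up for illustration.

```python
import numpy as np
import pandas as pd


def make_sample_data(n_rows=120, seed=0):
    """Build a made-up table with the column names used on this page."""
    rng = np.random.default_rng(seed)
    x = rng.integers(1, 11, size=n_rows)  # X1: integers from 1 to 10
    return pd.DataFrame({
        "X1": x,
        "Y1": 10 * x + rng.normal(0, 5, size=n_rows),  # roughly linear in X1
        "C1": rng.choice(["A", "B", "C"], size=n_rows),
        "C2": rng.choice(["M", "N"], size=n_rows),
        "C3": rng.choice(["P", "Q"], size=n_rows),
    })


df = make_sample_data()
print(df.head())
```

This df can stand in for the one read from Data.csv in the examples that use X1, Y1, and C1 to C3.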

Shortcuts to the Graphs

Graph to visualize all data
>>>Heatmap for All Variables

Graph to visualize many variables
>>>Compare 1-dimension distribution of all variables
>>>Line graph for all variables
>>>All pairs in all variables

Graph for analyzing the relationship between 2 variables
>>>Basic line graph
>>>Stratified line graph
>>>Line graph with confidence interval
>>>Basic scatter plot
>>>Stratified scatter plot
>>>Regression line
>>>Joint plot
>>>2-dimension histogram using hexagons
>>>Density distribution

Histogram
>>>Basic histogram
>>>Stratified histogram
>>>Good range histogram

Graph for 1 Variable Analysis
>>>Graph for 1 Variable
>>>Graph for 1 variable using 1 categorical variable
>>>Graph for 1 variable using 2 categorical variables
>>>Graph for 1 variable using 3 categorical variables

Bar plot
>>>Basic bar plot
>>>Bar plot for statistics
>>>Frequency plot

Memo
>>>Size of graph
>>>Effect of the orders of data
>>>I cannot use the graph?!

Difference of Data Type

Data types (left and right)

There are two types of data. The left graph is made from the left type of data with the method "Compare 1-dimension distribution of all variables".

The right graph is made from the right type of data with the method "Graph for 1 variable using 1 categorical variable".

The two graphs are very similar.
swarm plots (left and right)
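The two layouts are often called wide (one column per variable) and long (one column for the category and one for the value). Reading the left and right pictures this way is an assumption, and the column names A, B, C and the values below are made up. A minimal sketch of converting between the two with pandas:

```python
import pandas as pd

# Wide layout: one column per variable (assumed to be the "left" type).
wide_df = pd.DataFrame({
    "A": [10.1, 9.8, 10.3],
    "B": [12.0, 11.7, 12.4],
    "C": [9.5, 9.9, 9.7],
})

# Long layout: one category column (C1) and one value column (Y1),
# assumed to be the "right" type.
long_df = wide_df.melt(var_name="C1", value_name="Y1")

# Back to wide: number the rows within each category, then pivot.
long_df["row"] = long_df.groupby("C1").cumcount()
wide_again = long_df.pivot(index="row", columns="C1", values="Y1")

print(long_df)
print(wide_again)
```

With data like this, sns.swarmplot(data = wide_df) and sns.swarmplot(data = long_df, x='C1', y='Y1') should draw very similar graphs.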

Graph to visualize all data

Heatmap for All Variables

We can draw a heatmap of all variables. This method cannot be used for categorical data.

sns.heatmap(df)
heatmap

Y2 is very high. Y5 and Y6 seem to be similar.

Limit the color range to values from 8 to 12.

sns.heatmap(df,vmin=8,vmax=12)
heatmap

Using normalization (each column is rescaled to mean 0 and standard deviation 1).

df2 = (df - df.mean())/df.std() # normalization
sns.heatmap(data = df2)
heatmap
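Normalization puts columns with different scales on one color range. A small check of what the formula does, using made-up numbers in which Y2 is ten times larger than Y1:

```python
import pandas as pd

# Made-up data: Y2 sits on a much larger scale than Y1.
df = pd.DataFrame({"Y1": [9.0, 10.0, 11.0], "Y2": [90.0, 100.0, 110.0]})

# Same formula as above, applied column by column.
df2 = (df - df.mean()) / df.std()

print(df2)
```

After normalization both columns hold the same values, so a heatmap of df2 shows the pattern inside each column instead of the difference in scale between columns.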

Graph to visualize many variables

The graphs in this section are for the data type below.
data type

Compare 1-dimension distribution of all variables

sns.stripplot(data = df)
1-dimension scatter plot

sns.swarmplot(data = df)
swarm plot

sns.boxplot(data = df)
box plot

sns.violinplot(data = df)

violin plot

sns.pointplot(data = df)
This graph shows the mean and its confidence interval.
point plot

df.plot.hist(subplots=True)
I use Plot of Panda for histograms.
Histogram

I usually draw the histograms as separate graphs. But if the distributions are clearly separated, I draw them in one graph.

df.plot.hist()
Histogram

Line graph for all variables

sns.lineplot(data = df)
line graph

For separate graphs, I use Plot of Panda.
line graph
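The code for the separate line graphs is not shown here. A minimal sketch, assuming they were drawn with the pandas DataFrame.plot method and its subplots=True option (the data is made up):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs outside Jupyter
import pandas as pd

# Made-up wide data: one column per variable.
df = pd.DataFrame({
    "Y1": [10, 11, 10, 12],
    "Y2": [20, 19, 21, 20],
    "Y3": [5, 6, 5, 7],
})

# One line graph per column, stacked vertically.
axes = df.plot(subplots=True)
print(len(axes))
```

In a Jupyter notebook the matplotlib.use line is not needed.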

All pairs in all variables

sns.pairplot(df)
pair plot

sns.pairplot(df, hue='C1')
pair plot

Correlation analysis of multiple variables with a heatmap.

sns.heatmap(data = df.corr(), annot=True)  # correlation matrix
heatmap of the correlation matrix
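df.corr() works on numeric columns. In recent pandas versions (2.0 and later, as far as I know) a text column such as C1 makes it raise an error, so it can help to select the numeric columns first. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up data: Y2 is exactly 2 * Y1, and Y3 moves in the opposite direction.
df = pd.DataFrame({
    "Y1": [1.0, 2.0, 3.0, 4.0],
    "Y2": [2.0, 4.0, 6.0, 8.0],
    "Y3": [4.0, 3.0, 2.0, 1.0],
    "C1": ["A", "A", "B", "B"],
})

# Keep only the numeric columns, then compute the correlation matrix.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```

The result can be passed to the heatmap in the same way: sns.heatmap(data = corr, annot=True).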

Graph for analyzing the relationship between 2 variables

Basic line graph

sns.lineplot(data=df, x='X1', y='Y1',marker='o')
OR
sns.relplot(data=df, x='X1', y='Y1', kind='line',marker='o')
line graph and its data

Stratified line graph

sns.lineplot(data=df, x='X1', y='Y1',marker='o',hue='C1')
OR
sns.relplot(data=df, x='X1', y='Y1', kind='line',marker='o',hue='C1')
line graph and its data

sns.relplot(data=df, x='X1', y='Y1', kind='line',marker='o',col='C1')
line graph

sns.relplot(data=df, x='X1', y='Y1', kind='line',marker='o',row='C1')
line graph

Line graph with confidence interval

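The code is the same as for the basic line graph. When the data has several Y1 values for the same X1, seaborn draws the line through their mean and adds a confidence interval around it.

sns.relplot(data=df, x='X1', y='Y1', kind='line',marker='o')
line graph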
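The line drawn for repeated X1 values is the mean of Y1 at each X1. A minimal sketch of that aggregation in pandas, with made-up data (the confidence band is estimated by bootstrapping and is not reproduced here):

```python
import pandas as pd

# Made-up data: three Y1 measurements for each X1.
df = pd.DataFrame({
    "X1": [1, 1, 1, 2, 2, 2],
    "Y1": [9.0, 10.0, 11.0, 19.0, 20.0, 24.0],
})

# Mean of Y1 and number of measurements for each X1.
summary = df.groupby("X1")["Y1"].agg(["mean", "count"])
print(summary)
```

If every X1 has only one Y1, the count is 1 everywhere and no band appears, which matches the basic line graph.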

Basic scatter plot

Of the three functions below, only scatterplot lets us change the size of the graph with plt.figure(figsize=...).

sns.scatterplot(data=df, x='X1', y='Y1')
OR
sns.relplot(data=df, x='X1', y='Y1',kind='scatter')
OR
sns.lmplot(data=df, x='X1', y='Y1', fit_reg=False)
scatter plot

Stratified scatter plot

sns.scatterplot(data=df, x='X1', y='Y1', hue='C1')
OR
sns.relplot(data=df, x='X1', y='Y1',kind='scatter', hue='C1')
OR
sns.lmplot(data=df, x='X1', y='Y1', fit_reg=False, hue='C1')
scatter plot

sns.relplot(data=df, x='X1', y='Y1',kind='scatter', hue='C1', col='C2')
OR
sns.lmplot(data=df, x='X1', y='Y1', fit_reg=False, hue='C1', col='C2')
scatter plot

sns.relplot(data=df, x='X1', y='Y1',kind='scatter', hue='C1', col='C3', row='C2')
OR
sns.lmplot(data=df, x='X1', y='Y1', fit_reg=False, hue='C1', col='C3', row='C2')
scatter plot

Regression line

sns.lmplot(data=df, x='X1', y='Y1', hue='C1', col='C2',row='C3',fit_reg=True)
Regression line

Joint plot

sns.jointplot(data = df, x='X1', y='Y1')
Joint plot

2-dimension histogram using hexagons

sns.jointplot(data = df, x='X1', y='Y1', kind="hex")
2-dimension histogram using hexagons

Density distribution

sns.jointplot(data = df, x='X1', y='Y1', kind="kde")
Density distribution

Histogram

Histograms are for the right type of data.
data types (left and right)

Basic histogram

df.hist('Y1')
Histogram

Stratified histogram

sns.FacetGrid(df,col='C1').map(plt.hist,'Y1')
Histogram

sns.FacetGrid(df,row='C2',col='C1').map(plt.hist,'Y1')
Histogram

Good range histogram

df.hist('Y1',bins=30,range=(0,300))
Histogram
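With bins=30 and range=(0,300), every bin is 10 units wide and values outside 0 to 300 are not counted. A minimal sketch with NumPy's histogram function, which accepts the same two arguments, using made-up values:

```python
import numpy as np

# Made-up data; 350 lies outside the range and is ignored.
values = np.array([5, 15, 15, 120, 299, 350])

counts, edges = np.histogram(values, bins=30, range=(0, 300))

print(edges[:4])     # first bin edges
print(counts.sum())  # number of values inside the range
```

Fixing the range this way keeps the bin edges at round numbers and makes histograms of different columns comparable.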

Graph for 1 Variable Analysis

The graphs below are for this data type.
data type

Graph for 1 Variable

plt.figure(figsize=...) does not change the size of the graph when we use catplot. Use the height and aspect parameters of catplot instead.
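Figure-level functions such as catplot create their own figure, so their size is set with their own height and aspect parameters rather than with plt.figure. A minimal sketch, assuming seaborn 0.9 or later and made-up data; an axes-level function (boxplot) is shown alongside for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "C1": ["A", "A", "A", "B", "B", "B"],
    "Y1": [1.0, 2.0, 3.0, 2.0, 3.0, 5.0],
})

# Axes-level function: draw into an axes whose figure has the wanted size.
fig, ax = plt.subplots(figsize=(3, 3))
sns.boxplot(data=df, x="C1", y="Y1", ax=ax)

# Figure-level function: each facet is `height` inches tall and
# `height * aspect` inches wide.
g = sns.catplot(data=df, x="C1", y="Y1", kind="box", height=3, aspect=1.5)
grid_fig = g.axes.flat[0].figure

print(fig.get_size_inches())
print(grid_fig.get_size_inches())
```

The same height and aspect parameters exist on relplot and lmplot.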

sns.stripplot(data = df, y='Y1', jitter=False)
OR
sns.catplot(data = df, y='Y1', kind='strip', jitter=False)
1-dimension scatter plot

sns.stripplot(data = df, y='Y1', jitter=True)
OR
sns.catplot(data = df, y='Y1', kind='strip', jitter=True)
1-dimension scatter plot

sns.swarmplot(data = df, y='Y1')
OR
sns.catplot(data = df, y='Y1', kind='swarm')
swarm plot

sns.boxplot(data = df, y='Y1')
OR
sns.catplot(data = df, y='Y1', kind='box')
box plot

sns.violinplot(data = df, y='Y1', inner="quartile")
OR
sns.catplot(data = df, y='Y1', kind='violin', inner="quartile")
violin plot

sns.pointplot(data = df, y='Y1')
OR
sns.catplot(data = df, y='Y1', kind='point')
point plot

Graph for 1 variable using 1 categorical variable

sns.stripplot(data = df, x='C1', y='Y1', jitter=False)
OR
sns.catplot(data = df, x='C1', y='Y1', kind='strip', jitter=False)
1-dimension scatter plot

sns.stripplot(data = df, x='C1', y='Y1', jitter=True)
OR
sns.catplot(data = df, x='C1', y='Y1', kind='strip', jitter=True)
1-dimension scatter plot

sns.swarmplot(data = df, x='C1', y='Y1')
OR
sns.catplot(data = df, x='C1', y='Y1', kind='swarm')
swarm plot

sns.boxplot(data = df, x='C1', y='Y1')
OR
sns.catplot(data = df, x='C1', y='Y1', kind='box')
box plot

sns.violinplot(data = df, x='C1', y='Y1', inner="quartile")
OR
sns.catplot(data = df, x='C1', y='Y1', kind='violin', inner="quartile")
violin plot

sns.pointplot(data = df, x='C1', y='Y1')
OR
sns.catplot(data = df, x='C1', y='Y1', kind='point')
point plot

Graph for 1 variable using 2 categorical variables

sns.stripplot(data = df, x='C1', y='Y1', hue='C2', jitter=False, dodge=True)
OR
sns.catplot(data = df, x='C1', y='Y1', hue='C2', kind='strip', jitter=False, dodge=True)
1-dimension scatter plot

sns.stripplot(data = df, x='C1', y='Y1', hue='C2', jitter=True, dodge=True)
OR
sns.catplot(data = df, x='C1', y='Y1', hue='C2', kind='strip', jitter=True, dodge=True)
1-dimension scatter plot

sns.swarmplot(data = df, x='C1', y='Y1', hue='C2', dodge=True)
OR
sns.catplot(data = df, x='C1', y='Y1', hue='C2', kind='swarm', dodge=True)
swarm plot

sns.boxplot(data = df, x='C1', y='Y1', hue='C2')
OR
sns.catplot(data = df, x='C1', y='Y1', hue='C2', kind='box')
box plot

sns.violinplot(data = df, x='C1', y='Y1', hue='C2', split=True, inner="quartile")
OR
sns.catplot(data = df, x='C1', y='Y1', hue='C2', kind='violin', split=True, inner="quartile")
violin plot

sns.pointplot(data = df, x='C1', y='Y1',hue ='C2', dodge=True)
OR
sns.catplot(data = df, x='C1', y='Y1',hue ='C2', kind='point', dodge=True)
point plot

Graph for 1 variable using 3 categorical variables

sns.catplot(data = df, x='C1', y='Y1', col='C3', hue='C2',kind='box')
box plot

sns.catplot(data = df, x='C1', y='Y1', col='C3', row='C2',kind='box')
box plot

For many categories

sns.catplot(data = df, x='C3', y='Y1', col='C1', hue='C2',kind='box')
box plot

sns.catplot(data = df, x='C3', y='Y1', col='C1', hue='C2',kind='box',col_wrap = 3)
box plot

Bar plot

Basic bar plot

sns.barplot(data = df, x='C1', y='Y1', hue='C2')
OR
sns.catplot(data = df, x='C1', y='Y1', hue='C2',kind='bar')
Bar plot

Bar plot for statistics

When there are several values for the same category, the length of the bar is their mean. A confidence interval is also drawn.

sns.barplot(data = df, x='C1', y='Y1', hue='C2')
OR
sns.catplot(data = df, x='C1', y='Y1', hue='C2', kind='bar')
Bar plot

The statistic (mean by default) and the confidence interval can be changed.

sns.barplot(data = df, x='C1', y='Y1', hue='C2', ci='sd', estimator=max) # max and standard deviation
OR
sns.catplot(data = df, x='C1', y='Y1', hue='C2', kind='bar', ci='sd', estimator=max) # max and standard deviation
Bar plot
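The bar lengths can be checked against a plain pandas aggregation. A minimal sketch with made-up data that computes the default statistic (mean) together with the max and standard deviation used in the last example. pandas reports the sample standard deviation here; seaborn's error bars may follow a slightly different convention.

```python
import pandas as pd

# Made-up data with two categorical columns.
df = pd.DataFrame({
    "C1": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "C2": ["M", "M", "M", "N", "N", "M", "M", "M"],
    "Y1": [1.0, 2.0, 3.0, 4.0, 6.0, 10.0, 20.0, 30.0],
})

# One row per (C1, C2) pair, which corresponds to one bar in the plot.
stats = df.groupby(["C1", "C2"])["Y1"].agg(["mean", "max", "std"])
print(stats)
```

In seaborn 0.12 and later, as far as I know, the ci parameter is deprecated in favor of errorbar, for example errorbar='sd'.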

Frequency plot

sns.countplot(data = df, x='C1', hue='C2')
OR
sns.catplot(data = df, x='C1', hue='C2',kind='count')
Frequency plot

Memo

Size of graph

plt.figure(figsize=(3,3))
sns.swarmplot(data = df, x='C1', y='Y1')
The left graph is the default size. The right graph is made with figsize=(3,3).
Size of graph (left: default, right: (3,3))

"plt.figure(figsize=(3,3))" has no effect on pairplot, jointplot, and catplot.

Effect of the orders of data

For the graphs on this page, I changed the order of the categorical variables to make the left graph. If I do not change the order, the graph looks like the right one.
swarm plots (left: reordered, right: original order)
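How the order was changed is not shown. One way, given here as an assumption rather than the method used for this page, is to turn the column into an ordered pandas categorical. The labels and the list name wanted below are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "C1": ["Low", "High", "Mid", "Low", "High", "Mid"],
    "Y1": [1.0, 9.0, 5.0, 2.0, 8.0, 4.0],
})

wanted = ["Low", "Mid", "High"]  # hypothetical display order

# Store the order in the column itself.
df["C1"] = pd.Categorical(df["C1"], categories=wanted, ordered=True)

print(list(df["C1"].cat.categories))
print(df.sort_values("C1")["C1"].tolist())
```

The categorical functions of seaborn also accept an order argument, for example sns.swarmplot(data = df, x='C1', y='Y1', order=wanted), which sets the order for a single graph without changing the data.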

I cannot use the graph?!

The first time I wanted to use lineplot, relplot, and catplot, I could not use these functions.

The reason was that my seaborn version was 0.8.0.
To make this page, I used 0.10.0.
To check the version, I used this code:
print(sns.__version__)

To update the version, I opened Anaconda Prompt and ran
pip install seaborn -U
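lineplot, relplot, and catplot were added in seaborn 0.9.0, which is why they are missing in 0.8.0. A minimal sketch of a version check; the helper names version_tuple and has_relplot_family are made up, and the parsing assumes a plain version string such as "0.10.0".

```python
def version_tuple(version):
    """Turn a string like "0.10.0" into (0, 10, 0) so it compares numerically."""
    return tuple(int(part) for part in version.split(".")[:3])


def has_relplot_family(version):
    """Return True if the version is at least 0.9.0."""
    return version_tuple(version) >= (0, 9, 0)


print(has_relplot_family("0.8.0"))   # False
print(has_relplot_family("0.10.0"))  # True
```

The numbers are compared as tuples because a plain string comparison would rank "0.10.0" below "0.9.0". With seaborn imported, the check would be has_relplot_family(sns.__version__).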



Reference

https://seaborn.pydata.org/index.html